AITopics | global policy

Collaborating Authors

global policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Guided Policy Search via Approximate Mirror Descent

Neural Information Processing SystemsMay-1-2026, 06:07:20 GMT

Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a "teacher" algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.

local policy, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

CaMP: Causal Multi-policy Planning for Interactive Navigation in Multi-room Scenes Xiaohan Wang

Neural Information Processing SystemsFeb-9-2026, 21:44:37 GMT

Visual navigation has been widely studied under the assumption that there may be several clear routes to reach the goal.

artificial intelligence, machine learning, obstacle, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Artificial Intelligence > Robots (0.68)

Add feedback

FOVA: Offline Federated Reinforcement Learning with Mixed-Quality Data

Qiao, Nan, Yue, Sheng, Ren, Ju, Zhang, Yaoxue

arXiv.org Artificial IntelligenceDec-3-2025

Offline Federated Reinforcement Learning (FRL), a marriage of federated learning and offline reinforcement learning, has attracted increasing interest recently. Albeit with some advancement, we find that the performance of most existing offline FRL methods drops dramatically when provided with mixed-quality data, that is, the logging behaviors (offline data) are collected by policies with varying qualities across clients. To overcome this limitation, this paper introduces a new vote-based offline FRL framework, named FOVA. It exploits a \emph{vote mechanism} to identify high-return actions during local policy evaluation, alleviating the negative effect of low-quality behaviors from diverse local learning policies. Besides, building on advantage-weighted regression (AWR), we construct consistent local and global training objectives, significantly enhancing the efficiency and stability of FOVA. Further, we conduct an extensive theoretical analysis and rigorously show that the policy learned by FOVA enjoys strict policy improvement over the behavioral policy. Extensive experiments corroborate the significant performance gains of our proposed algorithm over existing baselines on widely used benchmarks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2512.0235

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.67)
Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Novel MDP Decomposition Framework for Scalable UAV Mission Planning in Complex and Uncertain Environments

Quamar, Md Muzakkir, Nasir, Ali, ELFerik, Sami

arXiv.org Artificial IntelligenceDec-2-2025

This paper presents a scalable and fault-tolerant framework for unmanned aerial vehicle (UAV) mission management in complex and uncertain environments. The proposed approach addresses the computational bottleneck inherent in solving large-scale Markov Decision Processes (MDPs) by introducing a two-stage decomposition strategy. In the first stage, a factor-based algorithm partitions the global MDP into smaller, goal-specific sub-MDPs by leveraging domain-specific features such as goal priority, fault states, spatial layout, and energy constraints. In the second stage, a priority-based recombination algorithm solves each sub-MDP independently and integrates the results into a unified global policy using a meta-policy for conflict resolution. Importantly, we present a theoretical analysis showing that, under mild probabilistic independence assumptions, the combined policy is provably equivalent to the optimal global MDP policy. Our work advances artificial intelligence (AI) decision scalability by decomposing large MDPs into tractable subproblems with provable global equivalence. The proposed decomposition framework enhances the scalability of Markov Decision Processes, a cornerstone of sequential decision-making in artificial intelligence, enabling real-time policy updates for complex mission environments. Extensive simulations validate the effectiveness of our method, demonstrating orders-of-magnitude reduction in computation time without sacrificing mission reliability or policy optimality. The proposed framework establishes a practical and robust foundation for scalable decision-making in real-time UAV mission execution.

artificial intelligence, machine learning, optimization problem, (19 more...)

arXiv.org Artificial Intelligence

2512.00838

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation (1.00)
Energy (1.00)
Aerospace & Defense (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Interpreting LLM-as-a-Judge Policies via Verifiable Global Explanations

Gajcin, Jasmina, Miehling, Erik, Nair, Rahul, Daly, Elizabeth, Marinescu, Radu, Tirupathi, Seshu

arXiv.org Artificial IntelligenceOct-10-2025

Using LLMs to evaluate text, that is, LLM-as-a-judge, is increasingly being used at scale to augment or even replace human annotations. As such, it is imperative that we understand the potential biases and risks of doing so. In this work, we propose an approach for extracting high-level concept-based global policies from LLM-as-a-Judge. Our approach consists of two algorithms: 1) CLoVE (Contrastive Local Verifiable Explanations), which generates verifiable, concept-based, contrastive local explanations and 2) GloVE (Global Verifiable Explanations), which uses iterative clustering, summarization and verification to condense local rules into a global policy. We evaluate GloVE on seven standard benchmarking datasets for content harm detection. We find that the extracted global policies are highly faithful to decisions of the LLM-as-a-Judge. Additionally, we evaluated the robustness of global policies to text perturbations and adversarial attacks. Finally, we conducted a user study to evaluate user understanding and satisfaction with global policies.

explanation, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2510.0812

Country: Europe (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (0.49)
Law (0.47)
Government > Military (0.35)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

CaMP: Causal Multi-policy Planning for Interactive Navigation in Multi-room Scenes Xiaohan Wang

Neural Information Processing SystemsOct-8-2025, 10:12:40 GMT

Visual navigation has been widely studied under the assumption that there may be several clear routes to reach the goal.

agent, navigation, obstacle, (11 more...)

Neural Information Processing Systems

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Artificial Intelligence > Robots (0.68)

Add feedback

Federated Reinforcement Learning in Heterogeneous Environments

Hwang, Ukjo, Hong, Songnam

arXiv.org Artificial IntelligenceJul-22-2025

Abstract--We investigate a Federated Reinforcement Learning with Environment Heterogeneity (FRL-EH) framework, where local environments exhibit statistical heterogeneity . Within this framework, agents collaboratively learn a global policy by aggregating their collective experiences while preserving the privacy of their local trajectories. T o better reflect real-world scenarios, we introduce a robust FRL-EH framework by presenting a novel global objective function. This function is specifically designed to optimize a global policy that ensures robust performance across heterogeneous local environments and their plausible perturbations. We propose a tabular FRL algorithm named FedRQ and theoretically prove its asymptotic convergence to an optimal policy for the global objective function. Furthermore, we extend FedRQ to environments with continuous state space through the use of expectile loss, addressing the key challenge of minimizing a value function over a continuous subset of the state space. Reinforcement Learning (RL) has demonstrated remarkable efficacy in tackling complex challenges across various domains, including gaming, robotics, intelligent networks, manufacturing, and finance [1]-[3]. However, the practical implementation of RL algorithms often encounters persistent obstacles, particularly the scarcity of training samples, especially in large action and state spaces.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2507.14487

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Average-Reward Maximum Entropy Reinforcement Learning for Global Policy in Double Pendulum Tasks

Choe, Jean Seong Bjorn, Choi, Bumkyu, Kim, Jong-kook

arXiv.org Artificial IntelligenceMay-13-2025

-- This report presents our reinforcement learning-based approach for the swing-up and stabilisation tasks of the acrobot and pendubot, tailored specifcially to the updated guidelines of the 3rd AI Olympics at ICRA 2025. Building upon our previously developed A verage-Reward Entropy Advantage Policy Optimization (AR-EAPO) algorithm, we refined our solution to effectively address the new competition scenarios and evaluation metrics. Extensive simulations validate that our controller robustly manages these revised tasks, demonstrating adaptability and effectiveness within the updated framework. Building upon prior competitions at IJCAI 2023 [3] and IROS 2024 [4], the current edition places particular emphasis on global policy robustness, requiring solutions for reliable swing-up stabilisation tasks from arbitrary initial configurations under significantly increased external disturbances. The competition maintains its use of two different configurations: the acrobot, characterised by an inactive shoulder joint, and the pendubot, with an inactive elbow joint.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2505.07516

Country: Asia > South Korea (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.42)

Add feedback

Provably Robust Federated Reinforcement Learning

Fang, Minghong, Wang, Xilong, Gong, Neil Zhenqiang

arXiv.org Artificial IntelligenceFeb-12-2025

Federated reinforcement learning (FRL) allows agents to jointly learn a global decision-making policy under the guidance of a central server. While FRL has advantages, its decentralized design makes it prone to poisoning attacks. To mitigate this, Byzantine-robust aggregation techniques tailored for FRL have been introduced. Yet, in our work, we reveal that these current Byzantine-robust techniques are not immune to our newly introduced Normalized attack. Distinct from previous attacks that targeted enlarging the distance of policy updates before and after an attack, our Normalized attack emphasizes on maximizing the angle of deviation between these updates. To counter these threats, we develop an ensemble FRL approach that is provably secure against both known and our newly proposed attacks. Our ensemble method involves training multiple global policies, where each is learnt by a group of agents using any foundational aggregation rule. These well-trained global policies then individually predict the action for a specific test state. The ultimate action is chosen based on a majority vote for discrete action systems or the geometric median for continuous ones. Our experimental results across different settings show that the Normalized attack can greatly disrupt non-ensemble Byzantine-robust methods, and our ensemble approach offers substantial resistance against poisoning attacks.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2502.08123

Country:

Oceania > Australia > New South Wales > Sydney (0.05)
Africa > Mali (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

TriHelper: Zero-Shot Object Navigation with Dynamic Assistance

Zhang, Lingfeng, Zhang, Qiang, Wang, Hao, Xiao, Erjia, Jiang, Zixuan, Chen, Honglei, Xu, Renjing

arXiv.org Artificial IntelligenceMar-22-2024

Navigating toward specific objects in unknown environments without additional training, known as Zero-Shot object navigation, poses a significant challenge in the field of robotics, which demands high levels of auxiliary information and strategic planning. Traditional works have focused on holistic solutions, overlooking the specific challenges agents encounter during navigation such as collision, low exploration efficiency, and misidentification of targets. To address these challenges, our work proposes TriHelper, a novel framework designed to assist agents dynamically through three primary navigation challenges: collision, exploration, and detection. Specifically, our framework consists of three innovative components: (i) Collision Helper, (ii) Exploration Helper, and (iii) Detection Helper. These components work collaboratively to solve these challenges throughout the navigation process. Experiments on the Habitat-Matterport 3D (HM3D) and Gibson datasets demonstrate that TriHelper significantly outperforms all existing baseline methods in Zero-Shot object navigation, showcasing superior success rates and exploration efficiency. Our ablation studies further underscore the effectiveness of each helper in addressing their respective challenges, notably enhancing the agent's navigation capabilities. By proposing TriHelper, we offer a fresh perspective on advancing the object navigation task, paving the way for future research in the domain of Embodied AI and visual-based navigation.

agent, navigation, objectnav, (12 more...)

arXiv.org Artificial Intelligence

2403.15223

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback